Customization of the Europarl Corpus for Translation Studies
Abstract
Translation studies currently lacks corpora against which scholars can validate their theoretical claims, for example, claims about the scope of the characteristics of the translation relation. In this paper, we describe a customized resource for translation studies that mainly supports research on the properties of the translation relation. Our experimental results show that the type-token ratio (TTR) is not a universally valid indicator of simplification in translation.
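For reference, TTR is conventionally the number of distinct word forms (types) divided by the number of running words (tokens). The following is a minimal sketch, assuming whitespace tokenization and lowercasing; the abstract does not state how the authors tokenized or normalized the text, and the function name `type_token_ratio` is hypothetical.

```python
def type_token_ratio(text: str) -> float:
    """Return distinct tokens divided by total tokens.

    Assumption: tokens are lowercased, whitespace-separated strings.
    This is an illustrative choice, not the paper's preprocessing.
    """
    tokens = text.lower().split()
    if not tokens:
        # Avoid division by zero on empty input.
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy comparison of two short texts (invented examples, not corpus data).
original = "the cat sat on the mat"
translation = "the cat sat on the rug near the door"
print(type_token_ratio(original), type_token_ratio(translation))
```

TTR tends to fall as a text gets longer, so comparisons between originals and translations are usually made on samples of equal length.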
Similar resources
Unsupervised Syntax-Based Machine Translation: The Contribution of Discontiguous Phrases
We present a new unsupervised syntax-based MT system, termed U-DOT, which uses the unsupervised U-DOP model for learning paired trees, and which computes the most probable target sentence from the relative frequencies of paired subtrees. We test U-DOT on the German-English Europarl corpus, showing that it outperforms the state-of-the-art phrase-based Pharaoh system. We demonstrate that the incl...
Phrase-Based Backoff Models for Machine Translation of Highly Inflected Languages
We propose a backoff model for phrase-based machine translation that translates unseen word forms in foreign-language text by hierarchical morphological abstractions at the word and the phrase level. The model is evaluated on the Europarl corpus for German-English and Finnish-English translation and shows improvements over state-of-the-art phrase-based models.
Stochastic Inversion Transduction Grammars for Obtaining Word Phrases for Phrase-based Statistical Machine Translation
An important problem in phrase-based statistical translation models is how to obtain word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Transduction Grammar. Experiments on the shared task proposed in this workshop with the Europarl corpus have been carried out and good results have been ob...
Experiments with Swedish–English Statistical Machine Translation
We have conducted initial experiments with statistical machine translation between English and Swedish based on the Moses toolkit and the Europarl corpus. The main aim was to decrease processing times without harming translation quality by changing the settings in Moses and for the parameter tuning. The experiments show that translation and tuning times can be cut by a factor of around 10 without harmin...
Using Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars
An important problem when using Stochastic Inversion Transduction Grammars is their computational cost. More specifically, when dealing with corpora such as Europarl, even a single iteration of the estimation algorithm becomes prohibitive. In this work, we reduce this cost by taking advantage of the bracketing information in parsed corpora and show machine translation results obtained with ...